2,070 research outputs found
CryptGraph: Privacy Preserving Graph Analytics on Encrypted Graph
Many graph mining and analysis services have been deployed on the cloud,
which can alleviate users from the burden of implementing and maintaining graph
algorithms. However, putting graph analytics on the cloud can invade users'
privacy. To solve this problem, we propose CryptGraph, which runs graph
analytics on encrypted graph to preserve the privacy of both users' graph data
and the analytic results. In CryptGraph, users encrypt their graphs before
uploading them to the cloud. The cloud runs graph analysis on the encrypted
graphs and obtains results which are also in encrypted form that the cloud
cannot decipher. During the process of computing, the encrypted graphs are
never decrypted on the cloud side. The encrypted results are sent back to users
and users perform the decryption to obtain the plaintext results. In this
process, users' graphs and the analytics results are both encrypted and the
cloud knows neither of them. Thereby, users' privacy can be strongly protected.
Meanwhile, with the help of homomorphic encryption, the results analyzed from
the encrypted graphs are guaranteed to be correct. In this paper, we present
how to encrypt a graph using homomorphic encryption and how to query the
structure of an encrypted graph by computing polynomials. To solve the problem
that certain operations are not executable on encrypted graphs, we propose hard
computation outsourcing to seek help from users. Using two graph algorithms as
examples, we show how to apply our methods to perform analytics on encrypted
graphs. Experiments on two datasets demonstrate the correctness and feasibility
of our methods
Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping
We consider the problem of estimating a sparse multi-response regression
function, with an application to expression quantitative trait locus (eQTL)
mapping, where the goal is to discover genetic variations that influence
gene-expression levels. In particular, we investigate a shrinkage technique
capable of capturing a given hierarchical structure over the responses, such as
a hierarchical clustering tree with leaf nodes for responses and internal nodes
for clusters of related responses at multiple granularity, and we seek to
leverage this structure to recover covariates relevant to each
hierarchically-defined cluster of responses. We propose a tree-guided group
lasso, or tree lasso, for estimating such structured sparsity under
multi-response regression by employing a novel penalty function constructed
from the tree. We describe a systematic weighting scheme for the overlapping
groups in the tree-penalty such that each regression coefficient is penalized
in a balanced manner despite the inhomogeneous multiplicity of group
memberships of the regression coefficients due to overlaps among groups. For
efficient optimization, we employ a smoothing proximal gradient method that was
originally developed for a general class of structured-sparsity-inducing
penalties. Using simulated and yeast data sets, we demonstrate that our method
shows a superior performance in terms of both prediction errors and recovery of
true sparsity patterns, compared to other methods for learning a
multivariate-response regression.Comment: Published in at http://dx.doi.org/10.1214/12-AOAS549 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Ultra-high Dimensional Multiple Output Learning With Simultaneous Orthogonal Matching Pursuit: A Sure Screening Approach
We propose a novel application of the Simultaneous Orthogonal Matching
Pursuit (S-OMP) procedure for sparsistant variable selection in ultra-high
dimensional multi-task regression problems. Screening of variables, as
introduced in \cite{fan08sis}, is an efficient and highly scalable way to
remove many irrelevant variables from the set of all variables, while retaining
all the relevant variables. S-OMP can be applied to problems with hundreds of
thousands of variables and once the number of variables is reduced to a
manageable size, a more computationally demanding procedure can be used to
identify the relevant variables for each of the regression outputs. To our
knowledge, this is the first attempt to utilize relatedness of multiple outputs
to perform fast screening of relevant variables. As our main theoretical
contribution, we prove that, asymptotically, S-OMP is guaranteed to reduce an
ultra-high number of variables to below the sample size without losing true
relevant variables. We also provide formal evidence that a modified Bayesian
information criterion (BIC) can be used to efficiently determine the number of
iterations in S-OMP. We further provide empirical evidence on the benefit of
variable selection using multiple regression outputs jointly, as opposed to
performing variable selection for each output separately. The finite sample
performance of S-OMP is demonstrated on extensive simulation studies, and on a
genetic association mapping problem. Adaptive Lasso; Greedy forward
regression; Orthogonal matching pursuit; Multi-output regression; Multi-task
learning; Simultaneous orthogonal matching pursuit; Sure screening; Variable
selectio
Integrating Document Clustering and Topic Modeling
Document clustering and topic modeling are two closely related tasks which
can mutually benefit each other. Topic modeling can project documents into a
topic space which facilitates effective document clustering. Cluster labels
discovered by document clustering can be incorporated into topic models to
extract local topics specific to each cluster and global topics shared by all
clusters. In this paper, we propose a multi-grain clustering topic model
(MGCTM) which integrates document clustering and topic modeling into a unified
framework and jointly performs the two tasks to achieve the overall best
performance. Our model tightly couples two components: a mixture component used
for discovering latent groups in document collection and a topic model
component used for mining multi-grain topics including local topics specific to
each cluster and global topics shared across clusters.We employ variational
inference to approximate the posterior of hidden variables and learn model
parameters. Experiments on two datasets demonstrate the effectiveness of our
model.Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty
in Artificial Intelligence (UAI2013
- …